#### APPENDIX A

## Parimics

Parallel versus Sequential Architecture

Parallel Image Computation System







# Parallel versus sequential

- MPP SIMDs like Parimics' IPE can access neighboring cells in a TDM fashion without conflict or contention.
- Sequential – even SMP – architectures need to go through a NorthBridge with an arbiter to access common (shared) memory.
- Memory is accessed using RAS and CAS as well as bank select. Only one entity can drive the address signals at any time, and that is the NorthBridge.
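The contention-free neighbor access described above can be illustrated with a minimal sketch. It assumes that in time slot k every cell reads the neighbor at the same relative offset, and that the grid wraps around at its edges. Neither detail is stated in the slides, and the names `NEIGHBOR_OFFSETS` and `max_readers_per_slot` are hypothetical.

```python
from collections import Counter

# The 8 neighbor offsets (row, column) of a cell, one per time slot.
# The ordering is an assumption; the slides do not state it.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                    (0, 1), (1, -1), (1, 0), (1, 1)]


def max_readers_per_slot(rows, cols):
    """Return the largest number of simultaneous readers any cell sees.

    In slot k every cell reads the neighbor at NEIGHBOR_OFFSETS[k].
    The grid wraps around at the edges (an assumption made to keep
    the sketch short).
    """
    worst = 0
    for dr, dc in NEIGHBOR_OFFSETS:
        readers = Counter()
        for r in range(rows):
            for c in range(cols):
                target = ((r + dr) % rows, (c + dc) % cols)
                readers[target] += 1
        worst = max(worst, max(readers.values()))
    return worst


print(max_readers_per_slot(3, 3))  # prints 1
```

Because all cells read in the same direction during a slot, each cell is read by exactly one neighbor at a time, so no arbiter is needed in this model.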





## Slide Show explanation

- The following slide show explains accesses of both an MPP SIMD to neighbors and an SMP into DRAM.
- The NorthBridge contains an arbiter and the DRAM controller.
- Arrows point in the direction of signal flow, be it address or data.
- Dotted arrows indicate read requests or address signals, dashed arrows indicate data signals.

#### Cycle 0a

Sequential: Read Request CPUs to NorthBridge

Parallel: Read Request 0



[Figure: array of Atomic Processing Engines. Legend: dotted arrow = read request, dashed arrow = data.]



#### Cycle 0b

Sequential: RAS and CAS from NorthBridge to DRAM

Parallel: Data 0




#### Cycle 1a

Sequential: Data out of DRAM cell 0

Parallel: Read Request 1



#### Cycle 1b

Sequential: Data out of NorthBridge into CPU 0

Parallel: Data 1


#### Cycle 2a

Sequential: Read Request CPUs to NorthBridge

Parallel: Read Request 2



#### Cycle 2b

Sequential: RAS and CAS from NorthBridge to DRAM





Parallel: Data 2



#### Cycle 3a

Sequential: Data out of DRAM cell 1

Parallel: Read Request 3




#### Cycle 3b

Sequential: Data out of NorthBridge into CPU 1



Parallel: Data 3




#### Cycle 4a

Sequential: Read Request CPUs to NorthBridge

Parallel: Read Request 4







#### Cycle 4b

Sequential: RAS and CAS from NorthBridge to DRAM

Parallel: Data 4






#### Cycle 5a

Sequential: Data out of DRAM cell 2

Parallel: Read Request 5

#### Cycle 5b

Sequential: Data out of NorthBridge into CPU 2



Parallel: Data 5





#### Cycle 6a

Sequential: Read Request CPUs to NorthBridge

Parallel: Read Request 6






#### Cycle 6b

Sequential: RAS and CAS from NorthBridge to DRAM

Parallel: Data 6




#### Cycle 7a

Sequential: Data out of DRAM cell 3

Parallel: Read Request 7






#### Cycle 7b

Sequential: Data out of NorthBridge into CPU 3

Parallel: Data 7





#### Cycle 8a

Sequential: Read Request CPUs to NorthBridge

Parallel: Data 8 (int.)



#### Cycle 8b

Sequential: RAS and CAS from NorthBridge to DRAM



#### Cycle 9a

Sequential: Data out of DRAM cell 4

Parallel: Processing




#### Cycle 9b

Sequential: Data out of NorthBridge into CPU 0

Parallel: Processing
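The half-cycle schedule walked through in cycles 0a to 9b follows a regular pattern, which the sketch below tries to reproduce. It is only an illustration under stated assumptions: the sequential side repeats a four-step sequence per DRAM cell, the parallel side alternates request and data for neighbors 0 to 7, and cycle 8 is labeled "Data 8 (int.)" for both halves because the slides show only that label. The function names are hypothetical, and the CPU index is omitted since the slides do not explain how it is assigned.

```python
# Four-step sequence the sequential side repeats for every DRAM cell.
SEQ_STEPS = [
    "Read Request CPUs to NorthBridge",
    "RAS and CAS from NorthBridge to DRAM",
    "Data out of DRAM cell {cell}",
    "Data out of NorthBridge into CPU",
]


def sequential_step(half_cycle):
    """Label of the sequential activity in the given half-cycle."""
    cell = half_cycle // 4
    return SEQ_STEPS[half_cycle % 4].format(cell=cell)


def parallel_step(half_cycle):
    """Label of the parallel activity in the given half-cycle."""
    cycle = half_cycle // 2
    if cycle < 8:  # one neighbor per full cycle: request, then data
        kind = "Read Request" if half_cycle % 2 == 0 else "Data"
        return f"{kind} {cycle}"
    if cycle == 8:  # assumption: the label covers both halves of cycle 8
        return "Data 8 (int.)"
    return "Processing"


def schedule(half_cycles=20):
    """Return (cycle label, sequential step, parallel step) tuples."""
    rows = []
    for h in range(half_cycles):
        label = f"{h // 2}{'ab'[h % 2]}"
        rows.append((label, sequential_step(h), parallel_step(h)))
    return rows


for label, seq, par in schedule():
    print(f"{label:>3} | {seq:<38} | {par}")
```

Counting the rows whose sequential step is "Data out of NorthBridge into CPU" gives the number of cells the sequential side has delivered by the end of cycle 9b.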






## Conclusion

- At this point the center processor of the MPP SIMD cluster has polled all of its 8 neighbors and processed their data and its own. The new cell content is in the center processor. Since all processors are center processors of their own cluster, the MPP SIMD is done.
- The sequential architecture has finished polling 5 out of all - in this case 9 - cells, and the processing power was not used efficiently since contention forced the processors to run mostly idle.
- On a quad VGA resolution with 16 IPEs, our parallel architecture would have finished the task, whereas a sequential processing architecture would have completed 5 pixels out of 1280 \* 960 = 1,228,800.
- Uniprocessor CPU or DSP based architectures are memory bound in image processing applications; SMP systems are even more memory bound. They are not compute bound.
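The figures in this conclusion can be checked with a few lines of arithmetic. The sketch below assumes the timing shown in the slides: 20 half-cycles in total (0a through 9b) and four half-cycles per cell on the sequential side. The constant and function names are hypothetical, and the model ignores DRAM details such as refresh or burst reads.

```python
HALF_CYCLES_SHOWN = 20        # cycles 0a through 9b in the slides
SEQ_HALF_CYCLES_PER_CELL = 4  # request, RAS/CAS, DRAM out, NorthBridge out
CELLS_PER_CLUSTER = 9         # center cell plus its 8 neighbors


def sequential_cells_fetched(half_cycles):
    """Cells fully delivered to a CPU after the given number of half-cycles."""
    return half_cycles // SEQ_HALF_CYCLES_PER_CELL


def sequential_half_cycles_needed(cells):
    """Half-cycles the sequential side needs to fetch the given cells."""
    return cells * SEQ_HALF_CYCLES_PER_CELL


fetched = sequential_cells_fetched(HALF_CYCLES_SHOWN)
needed = sequential_half_cycles_needed(CELLS_PER_CLUSTER)
quad_vga_pixels = 1280 * 960

print(fetched, needed, quad_vga_pixels)  # prints 5 36 1228800
```

Under these assumptions the sequential side needs 36 half-cycles just to fetch one 9-cell neighborhood, while the parallel side has fetched and processed its neighborhood within the 20 half-cycles shown.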